Machine learning-based integration identifies the ferroptosis hub genes in nonalcoholic steatohepatitis

Background Ferroptosis, is characterized by lipid peroxidation of fatty acids in the presence of iron ions, which leads to cell apoptosis. This leads to the disruption of metabolic pathways, ultimately resulting in liver dysfunction. Although ferroptosis is linked to nonalcoholic steatohepatitis (NASH), understanding the key ferroptosis-related genes (FRGs) involved in NASH remains incomplete. NASH may be targeted therapeutically by identifying the genes responsible for ferroptosis. Methods To identify ferroptosis-related genes and develop a ferroptosis-related signature (FeRS), 113 machine-learning algorithm combinations were used. Results The FeRS constructed using the Generalized Linear Model Boosting algorithm and Gradient Boosting Machine algorithms exhibited the best prediction performance for NASH. Eight FRGs, with ZFP36 identified by the algorithms as the most crucial, were incorporated into in FeRS. ZFP36 is significantly enriched in various immune cell types and exhibits significant positive correlations with most immune signatures. Conclusion ZFP36 is a key FRG involved in NASH pathogenesis. Supplementary Information The online version contains supplementary material available at 10.1186/s12944-023-01988-9.


Introduction
There is mounting evidence implicating non-alcoholic fatty liver disease (NAFLD) result in persistent liver damage [1,2].The increasing incidence of NAFLD affects human survival time [3].In the early stages, NAFLD manifests as nonalcoholic fatty liver disease [4]; nevertheless, advanced-stage NAFLD is characterized by necroinflammatory responses and increased hepatic fat accumulation.NASH can result from the accumulation of hepatic fat and the infiltration of inflammatory cytokines [5].NASH liver cells can develop fibrosis, hardening, and even cancer if no intervention is performed [6]; thus, early identification of NASH is crucial.NASH is traditionally diagnosed using a liver biopsy [7], but its invasive nature limits its clinical application.Another challenge in the management of NASH is the lack of effective treatments.Although lifestyle modifications and increased physical activity are considered primary therapeutic measures for NASH, research suggests that their outcomes are suboptimal [8].Thus, the search for therapeutic targets in NASH remains a critical endeavor.
Ferroptosis entails the activation of reactive oxygen species (ROS) and lipid peroxidation by iron ions [9].Ferroptosis likely contributes to the development of NASH because of the significant role of the liver in storing iron ions and regulating lipid metabolism [10].NASH treatment may be enhanced by understanding the core ferroptosis genes in patients with NASH.
In machine learning, predictive models are constructed by identifying complex patterns in datasets or by correlating data with predictions [11].This analytical discipline is useful for the development of optimal predictive models for disease diagnosis.Machine learning has been shown to provide robust risk models within clinical datasets, enabling the delineation of distinct patient cohorts based on clinical data [12,13].
This study hypothesises that the innovative integration of multiple machine learning algorithms to identify FRGs would develop a non-invasive diagnostic model to better serve patients with NASH.

Obtaining GEO data
Seven cohorts containing information from NASH patients (GSE130970, GSE89632, GSE61260, GSE126848, GSE48452, GSE164760, and GSE63067) were selected from the GEO database (https:// www.ncbi.nlm.nih.gov/ geo/).Gene matrices were calculated using the platform files associated with each GEO dataset.The following table presents the platform file information for the Fig. 1 The flow chart of the research seven GEO datasets.A merged dataset was created by merging the GSE130970 and GSE89632 datasets, after removing batch effects.The remaining five datasets were considered the external validation sets.Additionally, to increase the number of external validation sets, two additional validation cohorts (all-testing set cohorts and sample cohorts) were created (Supplementary Table 1).

Constructing a ferroptosis-related signature (FeRS)
Twelve machine learning algorithms were selected for this study for the binary classification variables.The Enet, Ridge, Stepglm, and Least Absolute Shrinkage and Selection Operator (LASSO) algorithms are typically applied for regression problems, particularly when there are many features, to select the most important features and reduce the risk of overfitting.The Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), glmBoost, Partial Least Squares Regression Generalized Linear Model (plsRglm), Random Forest, GBM, extreme Gradient Boosting (XGBoost), and Naive Bayes are all suitable for classification tasks in this study.A total of 113 algorithms were developed from the 12 machine learning Fig. 2 The process of data integration.A-B Statistics on the number of samples.C, D Before and after sample integration.E-F Principal component analysis (PCA) plots before and after the removal of the batch effect algorithms.Two measures are implemented to mitigate the risk of overfitting.A cross-validation approach was used in the first step, with one algorithm for variable selection and the other for constructing the classification prediction model for each combination algorithm.Another approach involves increasing the number of validation datasets.
The areas under the receiver operating characteristic (ROC) curves were evaluated for the 113 machine-learning algorithms in eight cohorts.A heat map was used to show the results of the model.The best algorithm was selected from among 113 algorithms.Model genes filtered using the best algorithm were used to calculate the ROC values for diagnosing NASH in the eight cohorts.

Pathway enrichment and immune infiltration analysis
GSVA analysis identified significantly enriched biological pathways [16].Additionally, it was used to explore enriched biological pathways when the model genes were upregulated or downregulated.Differences in the enrichment of immune features and immune cells were investigated in model genes.
GENEMANIA was applied to identify genes that interact with the model genes [17].Friendship analysis is a network topology-based analytical method used to explore interactions and relationships between genes or proteins in biological processes [18].It relies on semantic similarity measurements based on GO, which means that associations are analyzed using the functional information of genes.Friendship analysis can help select the most significant genes from a large set of DEGs.Therefore, it was employed to evaluate the most crucial genes among the model genes.

Single-cell analysis
The GSE159977 and GSE129516 datasets were downloaded.The former was separated into two groups, GSE159977(1) and GSE159977(2), each containing five human NASH samples.GSE129516 consists of three NASH mouse samples.To analyze the GSE159977(1), GSE159977(2), and GSE129516 datasets, the "Seurat" package was applied separately.First, the "PercentageFeatureSet()" function and all gene sets starting with MT-for mitochondrial quality control analysis were used to assess the proportion of mitochondrial genes in each cell to identify and exclude low-quality cells.To determine which features within the dataset displayed a high level of inter-cell variability, gene expression values were normalized using the "LogNormalize" method.For each dataset, 1500 features were selected for the PCA dimensionality reduction analysis, and the "JackStrawPlot()" and "ElbowPlot()" functions were used to determine the dimensionality of the dataset.Subsequently, cell clustering was performed, and the results were visualized using either the t-distributed stochastic neighborhood embedding method or the uniform surface approximation and projection method.Finally, the "SingleR" package was used to annotate different clusters as distinct immune cells.Additionally, differences in the expression of model genes among different immune cells were determined.

Quantitative real-time polymerase chain reaction (qPCR)
Eleven pairs of normal and NASH liver tissues (NASH score > 5) were collected from the hospital, and qPCR was performed.The RNA was extracted from the samples and then reverse-transcribed into cDNA.Relative expression of the model genes was standardized with GAPDH and Primers and reagents used are showed in Supplementary Table 2.

Statistical analysis
The Mann-Whitney U-test was applied to compare the two subgroups that did not satisfy the t-test.Statistical significance was set at P < 0.05.3. The concept of the full text is presented in Fig. 1.

Cohort integration
Healthy and NASH samples from seven datasets were analyzed (Fig. 2A,B).After removing the batch effect, GSE130970 and GSE89632 were combined to create a training set (Fig. 2C-F).

Sixteen ferroptosis-related DEGs associated with metabolism
Four hundred seventy five DEGs were observed.A total of 271 genes were highly expressed in NASH, whereas the opposite was true for 204 genes (Fig. 3A, B).Sixteen ferroptosis-related DEGs were identified when these 475 DEGs intersected with 484 FRGs (Fig. 3C).Notably, in the NASH group, six genes were significantly upregulated, while 10 genes were significantly opposite (Fig. 3D).
These 16 genes were significantly enriched in pathways associated with "alpha-linolenic acid metabolism, " "adipocyte differentiation, " "regulation of adipocyte differentiation, " and "circadian rhythm" (Fig. 3E, F).Consequently, the metabolic pathways are intertwined with NASH development in a complex manner.There was also a significant association between these genes and the etiologies of NASH, such as "fatty liver disease, " "lipid metabolism abnormalities, " and "insulin resistance" (Fig. 3G).

Enrichment patterns in NASH
A better understanding of the pathogenesis of NASH was obtained by comparing the enriched biological pathways."Cysteine and methionine metabolism, " "circadian rhythm, " "JAK-STAT signaling pathway, " "phenylalanine metabolism, " and "adipocytokine signaling pathway" exhibited significant upregulation in the NASH group (Fig. 8A).

Enrichment patterns for 8 model genes
An in-depth exploration of the biological significance of eight model genes was conducted.Twenty genes were closely associated with these eight genes (Fig. 9A).These 20 genes were significantly enriched in "PID AP1 pathway, " "PID P53 downstream pathway, " and "regulation of lipid metabolic processes" (Fig. 9B), as well as in the "MAPK family signaling cascades" associated with inflammation.Subsequently, their interactions were analyzed further in this study (Fig. 9C).ZFP36 ranked as the most important in FeRS, contrasting with FADS2 expression, yet showing synergy with IL6 expression (Fig. 9D).Notably, the downregulation of ZFP36 and IL6 suppressed the immune infiltration status of NASH, whereas the opposite was true for GRIA3 and FADS2 (Fig. 9E).

Single cell analysis
A significant enrichment of these eight model genes was observed in the different single-cell datasets.The GSE159977(1) cohort was annotated using "SingleR" into 4 immune cell types (Fig. 10A).ZFP36 was significantly enriched in all four immune cell types, whereas NR4A1 was enriched only in NK cells, monocytes, and B cells (Fig. 10B and C).The remaining six genes were not significantly enriched.The GSE159977(2) cohort was similarly annotated using "SingleR" (Fig. 10D).By contrast, ZFP36 was significantly enriched in all four of the immune cell types, whereas NR4A1 was enriched only in monocytes (Fig. 10E and F).This suggests that ZFP36 was highly expressed in multiple immune cell types, whereas NR4A1 was primarily upregulated in monocytes.To validate this, we used the mouse single-cell data GSE129516, which was annotated using "SingleR" into 12 cell types (Fig. 10G).ZFP36 was significantly upregulated in B cells, dendritic cells, NK cells, and macrophages (Fig. 10H and I).There was an obvious enrichment of NR4A1 in NK cells and macrophages, whereas FABP4

Identification of translation regulators
Forty-four TFs and 23 miRNAs were identified (Fig. 11A-B), which likely influenced the development of NASH by regulating the translation of the model genes.In addition, 60 drugs targeting the five model genes were identified (Fig. 11C).This indicates the possibility of curing NASH.

Patterns of gene expression
The genes were validated at the mRNA level using clinical samples.qPCR revealed the upregulation of GRIA3 and PRKAA2 and the downregulation of DHODH in patients with NASH (Fig. 12a-c).

Discussion
To date, NASH has remained a challenge that has needed to be addressed [33].Determining the drivers of inflammation is a major problem [34].Because the liver is a major site for iron ion metabolism, ferroptosis is thought to be involved in the development of NASH [35].Therefore, studies on ferroptosis may contribute to the management of NASH [36].
The 16 FRGs identified in this study are widely implicated in various metabolic pathways, such as "metabolism of alpha-linolenic acid, " "fat cell differentiation, " "regulation of fat cell differentiation, " and "circadian rhythm genes." Notably, these genes have been linked to the causes of NASH, such as "fatty liver disease, " "dyslipidemias, " "insulin resistance, " "gestational diabetes, " and "unipolar depression" [37][38][39][40][41]. Sixteen FRGs were processed further.The "glmBoost+GBM" combination algorithm was optimal, which developed a FeRS with 8 genes.This predictive model diagnosed NASH with an ROC value higher than 0.80 in eight datasets.FeRS diagnosed NASH well in all eight datasets compared to other diagnostic markers for NASH, demonstrating the strong generalizability of the model.
A deeper examination of these eight model genes has revealed that ZFP36 ranked first in importance in patients with NASH and was significantly downregulated (P < 0.001).ZFP36 is implicated in the "JAK-STAT signaling pathway" [42] and "TGF-β signaling pathway, " which are closely associated with inflammation regulation [43].ZFP36 is also associated with immune infiltration in NASH and its expression levels affect most immune signatures and cells.In particular, ZFP36 is positively associated with regulatory T cells and has anti-inflammatory effects.This implies that low ZFP36 expression in patients with NASH may promote the progression of inflammation by suppressing immune infiltration.
TFs can influence gene expression by regulating posttranscriptional translational processes [44].Five ZFP36 TFs were identified in this study.This discovery facilitates further understanding of the specific mechanism of ZFP36 effects in NASH.Another microRNAs (miRNAs) that post-transcriptionally regulates gene expression was also analyzed [45].It also inhibits translation and promotes RNA degradation [46].Eight miRNAs bound to ZFP36 were identified.They may have essential functions in the promotion of NASH progression by ZFP36, affecting both metabolic and inflammatory pathways [47][48][49].Additionally, the highly expressed genes GRIA3, FABP4, FADS2, and PRKAA2 have been implicated in lipid metabolism and liver fibrosis.They not only inhibit the anti-inflammatory effects of regulatory T cells but also promote the pro-inflammatory effects of M1 macrophages [50].In contrast, low expression of NR4A1, IL6, and DHODH involves key pathways such as "glycosphingolipid biosynthesis, " "circadian rhythms" and "type 2 diabetes, " activating pro-inflammatory M1 macrophages.

Study strengths and limitations
The advantage of this study is that compared to liver puncture biopsy, the FeRS not only can be applied conveniently and quickly on a large scale, but also avoids the great risks and pains that invasive operations bring to patients.However, the limitation is that NASH is a heterogeneous disease and the manifestations can vary according to individual differences.The FeRS may not be able to fully capture all potential variants, resulting in limited accuracy for different subtypes of NASH.

Conclusions
In this study, an FeRS was constructed and eight FRGs were identified.This model provides a noninvasive diagnostic method for NASH.Utilizing this noninvasive model helps to avoid complications associated with liver biopsy.Among the eight genes, ZFP36 ranked first.The downregulation of ZFP36 in NASH is not only involved in the "JAK-STAT pathway" and "TGFβpathway" associated with inflammation regulation but also inhibits immune infiltration and promotes inflammation progression.The identification of ZFP36 has positive implications for the treatment of NASH, and its activation may inhibit NASH progression.

Fig. 3
Fig. 3 It was determined that 16 genes were involved.(A) Across the two groups, 475 differential genes (DEGs) were found.(B) Expression landscape of DEGs.(C) Sixteen common genes were recognized.(D) Expression patterns of the genes.(E, F) Biological pathway enrichment analysis.(G) Disease enrichment analysis

Fig. 4
Fig. 4 A ferroptosis-related signature (FeRS) of nonalcoholic steatohepatitis (NASH) was developed.A The ROC values for diagnosing NASH using the 113 algorithms on the eight sets are shown in a heatmap.B-I Based on the eight model genes and FeRS, ROC values have been calculated for eight sets of NASH diagnoses

Fig. 5
Fig. 5 AUC values are shown in bar graphs.A-H The ROC values for diagnosing NASH using FeRS and 51 genes across 8 cohorts were calculated and compared

Fig. 10
Fig. 10 Single-cell annotations.a Four immune cells were annotated in GSE159977(1).b-c Enrichment landscapes of the model genes in immune cells.d Four immune cells were annotated in GSE159977(2).e-fEnrichment landscape of the model genes in immune cells.g Twelve immune cells were annotated in GSE129516.h-i Enrichment landscape of model genes in 12 types of cells

Fig. 11
Fig. 11 Identification of TFs, miRNAs, and drugs.a Forty-four TFs regulate the translation process of the model genes.b Twenty-three miRNAs regulate the translation process of the model genes.c Predicted 60 therapeutic drugs for NASH